From Database to Treebank: On Enhancing Hypertext Grammars with Grammar Engineering and Treebank Search

Authors

  • Emily M. Bender
  • Sumukh Ghodke
  • Timothy Baldwin
  • Rebecca Dridan
Abstract

This paper describes how electronic grammars can be further enhanced by adding machine-readable grammars and treebanks. We explore the potential benefits of implemented grammars and treebanks for descriptive linguistics, following the discursive methodology of Bird & Simons (2003) and the values and maxims identified by Nordhoff (2008). We describe the resources which we believe make implemented grammars and treebanks feasible additions to electronic descriptive grammars, with a particular focus on the Grammar Matrix grammar customization system (Bender et al. 2010) and the Fangorn treebank search application (Ghodke & Bird 2010). By presenting an example of an implemented grammar based on a descriptive prose grammar, we show one productive method of collaboration between grammar engineer and field linguist, and propose that a tighter integration could be beneficial to both, creating a virtuous cycle that could lead to more effective and informative resources.

Similar resources

Treebank vs. Xbar-based Automatic F-Structure Annotation

Manual, large-scale (computational) grammar development is time-consuming, expensive and requires lots of linguistic expertise. More recently, a number of alternatives based on treebank resources (such as Penn-II, Susanne, AP treebank) have been explored. The idea is to automatically “induce” or rather read off (P)CFG grammars from the parse-annotated treebank resources and to use the treebank ...

Comparing and integrating Tree Adjoining Grammars

Grammars are core elements of many NLP applications. Grammars can be developed in two ways: built by hand or extracted from corpora. In this paper, we compare a handcrafted grammar with a Treebank grammar. We contend that recognizing substructures of the grammars' basic units is necessary tures and semantic information which are rarely represented in the corpora. It would be ideal if we could c...

An annotation scheme for Persian based on Autonomous Phrases Theory and Universal Dependencies

A treebank is a corpus with linguistic annotations above the level of the parts of speech. During the first half of the present decade, three treebanks have been developed for Persian either originally or subsequently based on dependency grammar: Persian Treebank (PerTreeBank), Persian Syntactic Dependency Treebank, and Uppsala Persian Dependency Treebank (UPDT). The syntactic analysis of a sen...

A Bayesian Model for Supervised Induction of Natural Language Grammar

In this paper, we show that the problem of grammar induction could be modeled as a combination of several model selection problems. We use the infinite generalization of a Bayesian model of cognition to solve each model selection problem in our grammar induction model. This Bayesian model is capable of solving model selection problems, consistent with human cognition. We also show that using th...

Publication date: 2012